System and method for fibrechannel fail-over through port spoofing

ABSTRACT

In a system for appliance back-up, a primary appliance is coupled to a network, whereby the primary appliance receives requests or commands and sends a status message over the network to a standby appliance, which indicates that the primary appliance is operational. If the standby appliance does not receive the status message or the status message is invalid, the standby appliance writes a shutdown message to a storage device. The primary appliance then reads the shutdown message stored in the storage device and disables itself from processing requests or commands. When the primary appliance completes these tasks, it disables communication connections and writes a shutdown completion message to the storage device. The standby appliance reads the shutdown completion message from the storage device and initiates a start-up procedure. This procedure causes the address of the standby appliance to be identical to the primary appliance address, and the standby appliance processes the requests or commands in place of the primary appliance.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of U.S. patent application Ser. No. 09/792,873, filed Feb. 23, 2001, now abandoned, entitled “Storage Area Network Using A Data Communication Protocol,” and is also a continuation-in-part of U.S. patent application Ser. No. 09/925,976, filed Aug. 9, 2001, now U.S. Pat. No. 7,093,127, entitled “System And Method For Computer Storage Security,” the disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

The present invention concerns “port spoofing,” which allows a computer to “fail over” to its secondary fibrechannel connection if its primary fibrechannel connection should fail.

Fibrechannel is a network and channel communication technology that supports high-speed transmission of data between two points and is capable of supporting many different protocols such as SCSI (Small Computer Systems Interface) and IP (Internet Protocol). Computers, storage devices and other devices must contain a fibrechannel controller or host adapter in order to communicate via fibrechannel. Unlike standard SCSI cables, which cannot extend more than 25 meters, fibrechannel cables can extend up to 10 km. The extreme cable lengths allow devices to be placed far apart from each other, making fibrechannel ideal for use in disaster recovery planning. Many companies use the technology to connect their mass storage and backup devices to their servers and workstations.

In addition to being able to protect data through disaster recovery plans and backup, another requirement for a computer data communications network is that the storage devices must always be available for data storage and retrieval. This requirement is called “High Availability.” High Availability is a computer system configuration implemented with hardware and software such that, if a device fails, another device or system that can duplicate the functionality of the failed device will come on-line to take its place automatically and transparently. Users will not be aware that a failure and switch-over have taken place if the system is implemented properly. Many companies cannot afford to have downtime on their computer systems for any length of time. High Availability is used to ensure that their computer systems remain running continuously in the event of any device failure. Servers, storage devices, network switches and network connections are redundant and cross-connected to achieve High Availability. FIG. 1 shows a typical prior art fibrechannel High Availability configuration.

In the configuration of FIG. 1, High Availability is achieved by first creating mirrored storage devices 145 and 150 and then establishing multiple paths to the storage devices, which are represented by the fibrechannel connections 105, 110, 125, 130, 135, and 140. This configuration allows the server 100 to continue to store and retrieve its data, even if multiple failures have occurred, as long as at least one of its redundant hardware components or fibrechannel connections has not failed. For example, if paths 110 and 125 fail, the data traffic will be routed through paths 105 and 140 to access storage device 150. Special software must be running on the server to detect the failures and route the data through the working paths. The software is costly and requires valuable memory and CPU processing time from the server to manage the fail-over process.

SUMMARY OF THE INVENTION

The present invention is a system and method of achieving High Availability on fibrechannel data paths between an appliance's fibrechannel switch and its storage device by employing a technique called “port spoofing.” This system and method do not require any proprietary software to be executing on the file/application appliance other than the software normally required on an appliance, which includes the operating system software, the applications, and the vendor-supplied driver to manage its fibrechannel host adapter(s).

The invention includes a system for appliance back-up, in which a primary appliance is coupled to a network, whereby the primary appliance receives requests or commands and sends a status message over the network to a standby appliance, which indicates that the primary appliance is operational. If the standby appliance does not receive the status message or the status message is invalid, the standby appliance writes a shutdown message to a storage device, which is also coupled to the network. The primary appliance then reads the shutdown message stored in the storage device and disables itself from processing requests or commands. Preferably, when the primary appliance completes these tasks, it disables communication connections and writes a shutdown completion message to the storage device. The standby appliance reads the shutdown completion message from the storage device and initiates a start-up procedure, which includes causing the address of the standby appliance to be identical to the primary appliance address and processing the requests or commands in place of the primary appliance. The primary appliance can include a fibrechannel adapter having associated therewith the primary appliance address, and the standby appliance can have a fibrechannel adapter having associated therewith the standby appliance address. The standby appliance can include a standby application, which is identical to a primary application in the primary appliance, for processing the requests or commands.

The invention also includes a method for appliance backup, which includes sending a status message from a primary appliance to a standby appliance indicating that the primary appliance is operational. If the standby appliance does not receive the status message or the status message is invalid, a shutdown message is written to a storage device. The primary appliance reads the shutdown message stored in the storage device and is disabled from processing requests or commands. The disabling of the primary appliance can include completing tasks, disabling communication connections, and writing a shutdown completion message to the storage device. The standby appliance reads the shutdown completion message from the storage device and initiates a start-up procedure so that a standby application, included in the standby appliance, can process the requests or commands. A standby appliance address is changed to the primary appliance address and the standby appliance processes the requests or commands.

Another method for appliance back-up is disclosed which includes monitoring a primary appliance for an indication of a failure, the primary appliance having a primary appliance address. If the failure occurs, a message is written to a storage device and, in response, the primary appliance is disabled from processing requests or commands. The failure can be the primary appliance not sending the status message to a standby appliance. The standby appliance has a standby appliance address, which is changed to the primary appliance address so the standby appliance can process the requests or commands. The standby appliance address and the primary appliance address are world wide port names. The monitoring can include sending a status message to the standby appliance indicating that the primary appliance is operational, or sending a status request message to the primary appliance and receiving an update status message from the primary appliance. The failure message is written if the standby appliance does not receive the status message or if the status message is invalid. Alternatively, the message is written if the standby appliance does not receive the update status message or the update status message is invalid. The disabling can include completing tasks, disabling communication connections, writing a shutdown completion message to the storage device (by the primary appliance), reading the shutdown completion message from the storage device (by the standby appliance), and initiating a start-up procedure. The standby appliance can include a standby application, which is identical to a primary application in the primary appliance, for processing the requests or commands.

One of the primary advantages of the present invention is that additional software is not required to be running on the file/application server. Many system administrators prefer to only install the software that is necessary to run their file/application servers. Many other solutions require special software or drivers to run on the server in order to manage the fail-over procedure.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features and advantages of the invention will be apparent to those skilled in the art from the following detailed description of preferred embodiments, taken together with the accompanying drawings, in which:

FIG. 1 is a block diagram of a prior art fibrechannel High Availability network configuration;

FIG. 2 is a block diagram of the network configuration of the present invention;

FIG. 3 is a detailed block diagram of FIG. 2;

FIG. 4 is a block diagram showing a failed health monitor connection and the method used to send a shutdown signal;

FIG. 5 is a flowchart showing the actions of the primary appliance and the standby appliance when the health monitor link or primary appliance is non-functional;

FIG. 6 is a flowchart showing the actions of the standby appliance to become active; and

FIG. 7 is a block diagram showing more than one standby appliance.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is based on a software platform that creates a storage area network (“SAN”) for file and application servers to access their data from a centralized location. A virtualized storage environment is created and file/application servers can access their data through a communication protocol such as Ethernet/IP, fibrechannel, or any other communication protocol that provides high-speed data transmissions. Fibrechannel is the protocol that will be discussed herein, although it is understood that the other previously mentioned communication protocols are also within the scope of the present invention.

As mentioned before, computers, storage devices and other devices contain a fibrechannel (FC) controller or host adapter in order to communicate via fibrechannel. In the present invention, FC hubs/switches are used to connect file/application servers to servers that manage the storage devices. Storage devices can be RAID (redundant array of independent disks) subsystems, JBODs (just a bunch of disks), or tape backup devices, for example. An FC switch allows a server with a fibrechannel host adapter to communicate with one or more fibrechannel devices. Without a hub or switch, only a point-to-point or direct connection can be created, allowing only one server to communicate with only one device. As used herein, “switch” thus refers to either a fibrechannel hub or a fibrechannel switch.

Fibrechannel adapters are connected together by fiber or copper wire via their FC port(s). Each port is assigned a unique address called a WWPN or “world wide port name.” The WWPN is a unique 64-bit identifier assigned by the hardware manufacturer and is used to establish the source and destination between which data will travel. Therefore, when an FC device communicates with another FC device, the initiating FC device, or “originator,” must use the second FC device's WWPN to locate the device and establish the communication link.

Fibrechannel devices that are connected together by an FC switch communicate on a “fabric.” If a hub is employed, then the communication link is called a “loop.” On a fabric, devices receive the full bandwidth when they are communicating with each other, and on a loop the bandwidth is shared.

Although the manufacturers assign WWPN addresses, the addresses are not permanently fixed to the hardware: software can programmatically change the WWPN addresses on the fibrechannel hardware. The present invention employs this feature by changing the WWPN address on a standby FC adapter to the WWPN address used by the failed FC adapter.
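The address change described above can be pictured with a minimal Python sketch. It models an FC adapter as a plain object and is not tied to any real host-adapter driver; the class `FCAdapter`, the functions `format_wwpn` and `spoof`, and the example WWPN values are all hypothetical and chosen only for illustration. The colon-separated hexadecimal rendering is the notation commonly used to display a 64-bit WWPN.

```python
from dataclasses import dataclass


@dataclass
class FCAdapter:
    """Hypothetical stand-in for a fibrechannel adapter port."""
    name: str
    wwpn: int  # 64-bit world wide port name

    def set_wwpn(self, new_wwpn: int) -> None:
        # A real adapter would be reprogrammed through its vendor driver;
        # here the value is simply stored after a range check.
        if not 0 <= new_wwpn < 2**64:
            raise ValueError("a WWPN must fit in 64 bits")
        self.wwpn = new_wwpn


def format_wwpn(wwpn: int) -> str:
    """Render a WWPN as eight colon-separated hexadecimal bytes."""
    return ":".join(f"{b:02x}" for b in wwpn.to_bytes(8, "big"))


def spoof(standby: FCAdapter, failed_primary_wwpn: int) -> None:
    """Give the standby adapter the WWPN used by the failed adapter."""
    standby.set_wwpn(failed_primary_wwpn)


# Example values are arbitrary, not real manufacturer assignments.
primary = FCAdapter("primary", 0x2100001B32A1B2C3)
standby = FCAdapter("standby", 0x2100001B32FFFF01)
spoof(standby, primary.wwpn)
```

After `spoof` runs, both objects carry the same WWPN, which is the condition under which an originator addressing the primary's WWPN would reach the standby adapter instead.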

The present invention employs storage management software that is capable of running within any kind of computing device that has at least one CPU and is running an operating system. Examples of such computing devices are an Intel®-based PC, a Sun® Microsystems Unix® server, an HP® Unix® server, an IBM® Unix® server or embedded systems (collectively referred to as “appliances”). The software performs the writing, reading, management and protection of data from its file/application servers and workstations, and is disclosed with more specificity in U.S. patent application Ser. No. 09/792,873, filed Feb. 23, 2001, the disclosure of which has already been expressly incorporated herein by reference. One of the protection features of the software is the ability to “fail over” to another appliance if a set of defined failures occurs. The failures are defined and discussed in the following paragraphs.

More specifically, the present invention creates a transparent secondary path for data to flow in the event that a primary data path to a storage device or storage server managing the primary path fails for any reason. The secondary path is a backup communication link to the same storage device. Each computer contains at least one FC host adapter connected to one FC switch. This operation is shown in FIG. 2, which includes SAN client 200, FC switch 210, storage server A 225, storage server B 230, and storage device 250. Attached to each storage server is an FC adapter: primary FC adapter 216 is attached to storage server A 225 and standby FC adapter 217 is attached to storage server B 230. (There is also an FC adapter, not shown, attached to SAN client 200.) The primary data path consists of paths 205, 215, and 240, and the transparent secondary data path consists of paths 220 and 245. The secondary path 220 is a backup communication link to storage device 250. If primary path 215 fails, storage server B 230 detects the failure and directs its standby FC adapter 217 to begin “spoofing” primary FC adapter 216 by copying its identity and causing SAN client 200 to function with standby FC adapter 217 in place of primary FC adapter 216. Data then flow through backup FC connection 220, through standby FC adapter 217, into storage server B 230, and then to connection 245 to storage device 250.

FIG. 3 shows a more detailed view of FIG. 2. Two appliances, a primary appliance 525 and a standby appliance 530, are running the above-described software. The appliances can be computers, for example, personal computers, servers, or workstations. Standby appliance 530 is a fail-over appliance. The two appliances 525, 530 are connected to the same storage device 550 and to FC switch 510. The storage device 550 can be any kind of device that stores data important enough to require protection from failure, such as a hard disk, a RAID system, a CDROM, or a tape backup device. SAN client 500, which is a file/application server or workstation, is configured with two separate data paths, a primary path made up of paths 515 and 540, and a standby path made up of paths 520 and 545. Paths 515 and 520 always use a fibrechannel medium/protocol, but paths 540 and 545 may use fibrechannel, or may use a different medium/protocol such as SCSI, IDE (Integrated Drive Electronics) or any other storage medium/protocol. Although one SAN client is shown in the example of FIG. 3, in an actual production configuration, a primary appliance may manage the storage needs for multiple SAN clients. Data are actively transmitted bi-directionally over primary data paths 515, 540 between SAN client 500, primary appliance 525 and storage device 550 (as long as primary appliance 525 and its paths 515 and 540 remain in good working order). No data will be transmitted bi-directionally over standby paths 520, 545 between SAN client 500 and storage device 550. However, standby appliance 530 may or may not be data active (i.e., ready to receive or receiving data from the SAN client) depending on its configuration.

This standby appliance 530 can be implemented strictly as a fail-over appliance for one or more primary appliances. If its only function is to stand by, then standby appliance 530 must wait for one of the primary appliances to fail so that it can become data active. If a standby appliance 530 is a fail-over appliance for more than one primary appliance 525, then it must contain one dedicated standby FC adapter 517 for each primary appliance 525, and it must have a dedicated connection to each storage device 550 that it might need to manage. Standby appliance 530 itself can also be a primary appliance to its own set of SAN clients and storage devices 550. The operations of being both a primary and standby appliance are multitasked.

Standby appliance 530 monitors the status or the “health” of its primary appliance 525 through a communications link called the health monitor link 535. Messages called “fail-over heartbeats” are sent from standby appliance 530 to primary appliance 525, and if the messages are properly acknowledged the status of primary appliance 525 is acceptable. A “heartbeat” system is disclosed with more specificity in U.S. patent application Ser. No. 09/925,976, filed Aug. 9, 2001, entitled “System And Method For Computer Storage Security,” the disclosure of which has already been expressly incorporated herein by reference. If the heartbeat is not properly acknowledged or not acknowledged at all, then standby appliance 530 will begin the procedure for taking over the tasks of primary appliance 525. The heartbeat can also be implemented such that the heartbeat is sent from primary appliance 525 to standby appliance 530; this is simply a choice based on the software's architecture and ease of implementation. If a standby appliance 530 is a fail-over appliance for multiple primaries, the communications link can be configured to be shared among all primary appliances 525, or one dedicated communications link can be connected from each primary appliance 525 to standby appliance 530. The communications link can be any type of medium or protocol such as, for example, an Ethernet IP connection, a fibrechannel connection or a serial connection. It is also possible for the health monitor to function from standby FC adapter 517 along standby path 520 to monitor the status of the primary appliance.
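The heartbeat decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transport is abstracted as a caller-supplied function `send_heartbeat`, the reply is assumed to be a dictionary with a `"status"` field, and the names `Health`, `check_heartbeat` and `should_take_over` are hypothetical. The source does not specify a message format.

```python
from enum import Enum
from typing import Callable, Optional


class Health(Enum):
    OK = "ok"
    NO_ACK = "no acknowledgement"
    INVALID = "invalid acknowledgement"


def check_heartbeat(send_heartbeat: Callable[[], Optional[dict]]) -> Health:
    """Send one heartbeat and classify the primary's response."""
    try:
        reply = send_heartbeat()
    except OSError:
        # Covers a broken link (e.g. a cut cable) or a timed-out request.
        return Health.NO_ACK
    if reply is None:
        return Health.NO_ACK
    # Assumed reply format: {"status": "operational"} when healthy.
    if reply.get("status") != "operational":
        return Health.INVALID
    return Health.OK


def should_take_over(health: Health) -> bool:
    """The standby starts the takeover procedure on anything but OK."""
    return health is not Health.OK


def broken_link() -> Optional[dict]:
    raise ConnectionError("health monitor link is down")
```

Passing the transport in as a function keeps the sketch independent of whether the health monitor link is Ethernet, fibrechannel or serial.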

The health monitor link 535 performs several tasks:

1. It is used to monitor the status of the primary appliance. The standby appliance sends a request for the primary appliance's status. This is the heartbeat. The primary appliance sends the status data to the standby appliance, and the data are then analyzed. If a problem is discovered, the standby appliance will instruct the primary appliance to shut down.

2. When the standby appliance is assigned as the fail-over appliance for the primary appliance, health monitor link 535 is used to initially transfer from the primary appliance to the standby appliance all the information needed to emulate the primary appliance should a fail-over event take place. This information includes the operating parameters and data for the primary appliance and is static. “Static” means that the parameters do not change during the operation of the primary appliance. If the parameters are changed due to new requirements and needs by the user, the primary appliance will transfer the new information to the standby appliance. An alternative implementation is that the standby appliance is notified of the change and a request is sent from the standby appliance to the primary appliance to retrieve the new set of parameters. Currently the first method is used (the primary appliance initiates the transfer to the standby appliance), but future implementations due to evolution of the fail-over feature may require the latter method.

3. Health monitor link 535 is used to transfer any information from the primary appliance to the standby appliance at the time of fail-over if the primary appliance continues to run. This information is used to help smooth the standby appliance's fail-over process. This information is dynamic and is not required by the standby appliance; the information is merely helpful. The information is dynamic because its content is based on the primary appliance's current operating state. The information is not required because if the primary appliance failure were due to a system crash, the standby appliance would not be able to receive this information.

4. Health monitor link 535 is used by the primary appliance to inform the standby appliance to begin taking over if the primary appliance discovers a problem where it becomes necessary for the primary appliance itself to initiate the fail-over process.

5. Health monitor link 535 is used by the standby appliance to inform the primary appliance to shut itself down so that the standby appliance can take over the primary appliance's tasks if it detects over its health monitor link an imminent failure of the primary appliance.

6. Health monitor link 535 is used by the standby appliance to inform the primary appliance to resume its FC activities when the primary appliance's failure has been fixed. The standby appliance does this by maintaining its connection with the primary appliance even though the primary appliance is no longer active to receive or send commands and data. The primary appliance continues to send status data to the standby appliance. When the problem affecting the primary appliance has been repaired, the standby appliance will be informed via the status data, whereby the standby appliance will begin de-activating itself from receiving additional commands and data from the SAN client and will instruct the primary appliance to begin its start-up procedure to resume receiving commands and data from the SAN client once again.

Standby appliance 530 also takes over its primary appliance's tasks if health monitor link 535 is broken or the heartbeat is not acknowledged. Health monitor link 535 may be broken due to a cut cable or “accidental” removal. The heartbeat may not be acknowledged because primary appliance 525 loses power, crashes, or incurs another similar event. Although a broken link 535 does not affect the ability of primary appliance 525 to perform its tasks, primary appliance 525 will be regarded as a failed appliance nonetheless, and standby appliance 530 will take steps to begin to take over the tasks from primary appliance 525. Since standby appliance 530 cannot communicate with primary appliance 525 over the broken link to instruct it to shut itself down, a backup method is used to pass on the shutdown signal.

FIG. 4 illustrates a failed health monitor connection 600 and the method used to send a shutdown signal. Since primary appliance 605 and standby appliance 610 are connected to the same storage device 630, storage device 630 will become the medium used to pass the shutdown signal to primary appliance 605. A common file or a disk sector (or sectors) 625 is reserved on the storage device 630. Primary appliance 605 monitors the common file or disk sector 625 at regular, pre-defined intervals for instructions from standby appliance 610. If standby appliance 610 detects no acknowledgement of its heartbeats or there is a broken health monitor link, the standby appliance writes into common file 625 an instruction for primary appliance 605 to begin its shutdown procedures, which include completing outstanding tasks to its application/file servers and/or workstation and disconnecting itself from the fibrechannel communication network. If primary appliance 605 is alive, which means that the health monitor link is corrupted, the primary appliance reads the shutdown signal from the common file 625 and writes an acknowledgement into the common file 625 that it has received the shutdown signal and is beginning its shutdown procedure. Standby appliance 610 then waits a pre-determined amount of time for a message to come through the common file 625 from primary appliance 605 that the latter has completed its shutdown procedure. Standby appliance 610 monitors the common file 625 for the completion message during this time interval, and begins its start-up procedures as soon as the completion message is given. When the shutdown procedure is completed by primary appliance 605, primary appliance 605 then writes a shutdown completion message to common file 625, and standby appliance 610 begins its procedure to become active and take over the tasks of its failed primary appliance 605. If standby appliance 610 does not receive a shutdown completion message from primary appliance 605 within a predetermined time interval, standby appliance 610 assumes that primary appliance 605 has become totally inoperative and initiates its procedures to become active to take over the tasks of the failed primary appliance 605. Since common file 625 is used as a backup communication link between the appliances, it is also used to communicate any dynamic information from the primary appliance to the standby appliance that may be helpful to the fail-over process. This information can be historical and/or state information, which can be used during start-up procedures by either appliance. For example, if the primary appliance is turned off followed by the standby appliance being turned off, the standby appliance writes a message to the storage device indicating that it is no longer operating in place of the primary appliance. If the primary appliance resumes operation before the standby appliance, the primary appliance knows from reading the message that it is to resume processing commands and requests. As stated earlier, this information is not required for the fail-over process; it simply makes the process easier.
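The use of a reserved common file as a backup signalling channel can be illustrated with a short Python sketch. It is a simplification under several assumptions: the common file is an ordinary JSON file holding a list of message strings, there is no locking between the two appliances, and the message names and the helpers `CommonFile`, `primary_poll` and `demo` are hypothetical rather than taken from the source.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List

# Hypothetical message names for the three signals described in the text.
SHUTDOWN = "shutdown"
ACK = "shutdown-ack"
DONE = "shutdown-complete"


class CommonFile:
    """Shared file on the storage device, used as a simple mailbox."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def read(self) -> List[str]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def append(self, message: str) -> None:
        messages = self.read()
        messages.append(message)
        self.path.write_text(json.dumps(messages))


def primary_poll(common: CommonFile, shut_down: Callable[[], None]) -> bool:
    """One polling pass by the primary; returns True if it shut down."""
    if SHUTDOWN not in common.read():
        return False
    common.append(ACK)   # acknowledge receipt of the shutdown signal
    shut_down()          # complete tasks and disable connections
    common.append(DONE)  # report that the shutdown has finished
    return True


def demo() -> List[str]:
    """Standby writes the signal, primary polls once, file is returned."""
    with tempfile.TemporaryDirectory() as tmp:
        common = CommonFile(Path(tmp) / "common.json")
        common.append(SHUTDOWN)
        primary_poll(common, lambda: None)
        return common.read()
```

In practice the primary would call `primary_poll` at its regular, pre-defined interval; a reserved disk sector could stand in for the file with the same three-message exchange.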

If primary appliance 605 initially becomes inoperative because of loss of power, system crash, or some other catastrophic event, standby appliance 610 writes its shutdown message to the common file 625 with the assumption that primary appliance 605 may still be active. Standby appliance 610 functions in this manner because it cannot be assumed that primary appliance 605 is totally inoperative. A predetermined time interval is given by standby appliance 610 for primary appliance 605 to respond to the shutdown message, and if the shutdown message is not acknowledged standby appliance 610 begins its procedures to become active to take over the tasks of the failed primary appliance 605. Standby appliance 610 monitors the common file 625 for the shutdown acknowledgement message, and as soon as this message is received standby appliance 610 waits for the shutdown completion message.

FIG. 5 is a flowchart which describes the actions taken by primary appliance 605 and standby appliance 610 when the health monitor link or primary appliance is non-functional. Blocks 700 through 715 illustrate the steps undertaken by primary appliance 605. At block 700, primary appliance 605 receives the shutdown message in common file 625 from standby appliance 610. Primary appliance 605 writes a shutdown acknowledgment message to common file 625 at block 705. At block 710, primary appliance 605 begins its shutdown procedure by completing outstanding tasks and disabling its connections. Finally, at block 715, primary appliance 605 writes its shutdown completion message to common file 625.

Blocks 720 through 760 detail the steps employed by standby appliance 610. At block 720, standby appliance 610 detects the lack of a response from the health monitor link. At block 725, standby appliance 610 writes the shutdown message to common file 625. The program proceeds to blocks 730 and 740 to wait for a shutdown acknowledgment message from primary appliance 605. Decision block 730 queries whether the shutdown acknowledgment message has been received from primary appliance 605. If the answer is “NO,” the program proceeds to decision block 740, which queries whether the predetermined time period has expired. If the answer at decision block 740 is “NO,” the program loops back to block 730. If the answer at decision block 740 is “YES,” the program proceeds to block 760 where standby appliance 610 begins procedures to become active and to take over the tasks of primary appliance 605. Returning to decision block 730, if the answer to the query is “YES,” the program proceeds to blocks 750 and 755 where standby appliance 610 waits for the shutdown completion message from primary appliance 605. In decision block 750, the program queries whether the shutdown completion message has been received from primary appliance 605. If the answer is “NO,” the program proceeds to decision block 755, which queries whether the predetermined time period has expired. If the answer at decision block 755 is “NO,” the program loops back to block 750. If the answer at decision block 755 is “YES,” the program proceeds to block 760 where standby appliance 610 begins procedures to become active and to take over the tasks of primary appliance 605. Returning to decision block 750, if the answer to the query is “YES,” the program again proceeds to block 760, as discussed immediately above.
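The standby-side flow of blocks 725 through 760 can be sketched as two timed polling loops. The Python below is an illustrative sketch, not the disclosed implementation: the mailbox is abstracted as `write_message` and `read_messages` callables, the message strings and timeout values are assumptions, and the function names `wait_for` and `standby_takeover` are hypothetical. Block numbers appear in comments only to tie the code back to the flowchart.

```python
import time
from typing import Callable, Iterable


def wait_for(message: str,
             read_messages: Callable[[], Iterable[str]],
             timeout: float,
             poll_interval: float = 0.005) -> bool:
    """Poll until `message` appears or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if message in read_messages():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def standby_takeover(write_message: Callable[[str], None],
                     read_messages: Callable[[], Iterable[str]],
                     activate: Callable[[], None],
                     ack_timeout: float,
                     done_timeout: float) -> str:
    """Run the standby side of the flowchart and report which path ran."""
    write_message("shutdown")                                   # block 725
    if not wait_for("shutdown-ack", read_messages, ack_timeout):
        outcome = "no-ack"                                      # 730/740 timed out
    elif wait_for("shutdown-complete", read_messages, done_timeout):
        outcome = "clean"                                       # block 750 answered YES
    else:
        outcome = "completion-timeout"                          # 750/755 timed out
    activate()                                                  # block 760 in every case
    return outcome
```

All three paths end in activation, which matches the flowchart: the timeouts only bound how long the standby waits before taking over.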

After the shutdown completion message is received or after the time has expired waiting for the shutdown acknowledgement or completion messages, the standby appliance begins its procedures to become active. From FIG. 3, standby appliance 530 reprograms its standby FC adapter 517 with the WWPN address from primary FC adapter 516. Standby FC adapter 517 was given a temporary WWPN address in order for it to be connected to the fibrechannel fabric. Standby appliance 530 knows the WWPN address of the primary appliance because when standby appliance 530 was initially assigned to be the fail-over appliance for primary appliance 525, it communicated with primary appliance 525 to transfer all the necessary information it needed to perform the emulation. This information included the WWPN address of primary FC adapter 516.

A flowchart in FIG. 6 shows the steps taken by standby appliance 530. At block 800, standby appliance 610 initiates its activation procedures. Standby appliance 610 checks its connection at block 805 to ensure functionality. At block 810, standby appliance 610 retrieves the saved WWPN address of the FC adapter of failed primary appliance 605. Standby appliance 610 reprograms its standby FC adapter with the new WWPN address at block 815. Finally, at block 820 standby appliance 610 is functionally able to manage storage for the SAN client of failed primary appliance 605, in a manner transparent to the SAN client.
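The activation sequence of FIG. 6 can be sketched as a single method that walks through the blocks in order. This Python sketch is illustrative only: the class `StandbyAppliance` and its fields are hypothetical, the WWPN values are arbitrary, and raising an error when the connection check fails is an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class StandbyAppliance:
    """Hypothetical model of the standby appliance's activation state."""
    adapter_wwpn: int          # temporary WWPN used to join the fabric
    saved_primary_wwpn: int    # WWPN transferred when fail-over was assigned
    link_ok: bool = True
    active: bool = False

    def activate(self) -> None:
        # Block 800: activation begins when this method is called.
        # Block 805: check the standby connection (failure handling assumed).
        if not self.link_ok:
            raise RuntimeError("standby FC connection is not functional")
        # Block 810: retrieve the saved WWPN of the failed primary's adapter.
        new_wwpn = self.saved_primary_wwpn
        # Block 815: reprogram the standby adapter with that WWPN.
        self.adapter_wwpn = new_wwpn
        # Block 820: the standby now serves the primary's SAN client.
        self.active = True


# Arbitrary example values, not real manufacturer assignments.
standby_530 = StandbyAppliance(adapter_wwpn=0x2100001B32FFFF01,
                               saved_primary_wwpn=0x2100001B32A1B2C3)
standby_530.activate()
```

Because the primary's WWPN was saved in advance, activation does not depend on the failed primary being reachable.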

Once the WWPN address is programmed into standby FC adapter 517, SAN client 500 will not be aware of the change in appliances. Standby appliance 530 will now receive all the data traffic that was bound for failed primary appliance 525. When a standby appliance is a fail-over appliance for one or more primary appliances, a table is kept to store and keep track of the information needed to emulate the primary appliances, which includes the WWPN addresses.
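One way to picture the table mentioned above is a mapping from each primary appliance to the data needed to emulate it. The Python sketch below is an assumption about the table's shape, not a description of the actual data structure: the names `PrimaryRecord` and `FailoverTable`, the string identifiers, and the choice to store only a WWPN and a dedicated standby adapter name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PrimaryRecord:
    """Emulation data kept for one primary appliance (assumed fields)."""
    primary_wwpn: int       # WWPN of the primary's FC adapter
    standby_adapter: str    # dedicated standby adapter reserved for it


class FailoverTable:
    """Table kept by a standby that backs up several primaries."""

    def __init__(self) -> None:
        self._records: Dict[str, PrimaryRecord] = {}

    def register(self, primary_id: str, primary_wwpn: int,
                 standby_adapter: str) -> None:
        self._records[primary_id] = PrimaryRecord(primary_wwpn,
                                                  standby_adapter)

    def lookup(self, primary_id: str) -> PrimaryRecord:
        # Raises KeyError if the primary was never assigned to this standby.
        return self._records[primary_id]


table = FailoverTable()
table.register("primary-A", 0x2100001B32A1B2C3, "standby-adapter-0")
table.register("primary-B", 0x2100001B32A1B2C4, "standby-adapter-1")
```

Keeping one record per primary mirrors the requirement that the standby hold one dedicated standby FC adapter for each primary it protects.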

The technology of the present invention is not limited to one standby appliance that can act as a fail-over to a set of primary appliances. As illustrated in FIG. 7, the present invention also encompasses having a standby fail-over appliance 910 acting as a fail-over appliance to another standby fail-over appliance 920. In this way, such multiple backup systems protect businesses' computer and storage systems from failing.

It should be understood by those skilled in the art that the present description is provided only by way of illustrative example and should in no manner be construed to limit the invention as described herein. Numerous modifications and alternate embodiments of the invention will occur to those skilled in the art. Accordingly, it is intended that the invention be limited only in terms of the following claims.

1. A system for appliance back-up comprising: a network; a storage device coupled to the network; and a primary appliance and a standby appliance coupled to the network, the primary appliance receiving requests or commands and sending a status message via the network to the standby appliance indicating that the primary appliance is operational, wherein if the standby appliance does not receive the status message or the status message is invalid: the standby appliance writes a shutdown message to the storage device, the primary appliance reads the shutdown message stored in the storage device and disables itself from processing requests or commands, and the standby appliance causes a standby appliance address to be identical to a primary appliance address and processes the requests or commands.
2. The system of claim 1, wherein the primary appliance completes tasks and disables communication connections.
3. The system of claim 2, wherein the primary appliance writes a shutdown completion message to the storage device.
4. The system of claim 3, wherein the standby appliance reads the shutdown completion message from the storage device and initiates a start-up procedure.
5. The system of claim 1, wherein the primary appliance includes a primary application and the standby appliance includes a standby application, the standby application being identical to the primary application.
6. The system of claim 1, wherein the primary appliance includes a first fibrechannel adapter having associated therewith the primary appliance address and the standby appliance includes a second fibrechannel adapter having associated therewith the standby appliance address.
7. A method for appliance back-up comprising: sending a status message from a primary appliance to a standby appliance indicating that the primary appliance is operational; if the standby appliance does not receive the status message or the status message is invalid: writing a shutdown message to a storage device; reading the shutdown message stored in the storage device; disabling the primary appliance from processing requests or commands; causing a standby appliance address to be identical to a primary appliance address; and causing the standby appliance to process the requests or commands.
8. The method of claim 7, wherein the disabling further comprises completing tasks and disabling communication connections.
9. The method of claim 7, wherein the disabling further comprises writing a shutdown completion message to the storage device.
10. The method of claim 9, further comprising: reading the shutdown completion message from the storage device; and initiating a start-up procedure.
11. The method of claim 7, wherein the primary appliance includes a primary application and the standby appliance includes a standby application, identical to the primary application, for processing the requests or commands.
12. A method for appliance back-up comprising: monitoring a primary appliance for an indication of a failure, the primary appliance having a primary appliance address, wherein if the failure occurs: writing a message to a storage device; in response to the message, disabling the primary appliance from processing requests or commands; causing a standby appliance address of a standby appliance to be identical to the primary appliance address; and processing the requests or commands.
13. The method of claim 12, wherein the monitoring further comprises sending a status message to the standby appliance indicating that the primary appliance is operational.
14. The method of claim 12, wherein the monitoring further comprises sending a status request message to the primary appliance and receiving an update status message from the primary appliance.
15. The method of claim 13, wherein the failure is the status message is not sent to the standby appliance.
16. The method of claim 13, wherein the message is written if the standby appliance does not receive the status message or the status message is invalid.
17. The method of claim 16 wherein the disabling further comprises completing tasks and disabling communication connections.
18. The method of claim 17, wherein the disabling further comprises writing a shutdown completion message to the storage device.
19. The method of claim 18, further comprising: reading the shutdown completion message from the storage device; and initiating a start-up procedure.
20. The method of claim 14, wherein the message is written if the standby appliance does not receive the update status message or the update status message is invalid.
21. The method of claim 20, wherein the disabling further comprises completing tasks and disabling communication connections.
22. The method of claim 21, wherein the disabling further comprises writing a shutdown completion message to the storage device.
23. The method of claim 12, wherein the standby appliance address and the primary appliance address are world wide port names.
24. The method of claim 12, wherein the primary appliance includes a primary application and the standby appliance includes a standby application, identical to the primary application, for processing the requests or commands.
25. The system of claim 1, wherein: the standby appliance monitors the status of the primary appliance via a communications link; and the standby appliance writes the shutdown message to the storage device if the communications link is broken.
26. The method of claim 7, comprising writing the shutdown message to the storage device if a communications link between the standby appliance and the primary appliance is broken.
27. A communications system, comprising: at least one storage device; a first appliance configured to receive requests or commands for communicating with one or more of the at least one storage devices via a first communications link, the first appliance having a first appliance address; and a second appliance configured to: transmit, at selected times, messages to the first appliance via a second communications link different from the first communications link; wherein: the first appliance is further configured to: communicate with one or more of the at least one storage devices in response to a received request or command; and in response to each message received from the second appliance, provide an indication to the second appliance of a status of the first appliance via the second communications link; and the second appliance is further configured to: monitor the status of the first appliance based, at least in part, on the indications received from the first appliance; determine whether a proper indication is received in response to each message; assume an emulation address comprising the first appliance address in order to receive the requests or commands addressed to the first appliance, based, at least in part, on a failure to receive a proper indication; process the requests or commands addressed to the first appliance, after assuming the emulation address; continue to monitor the status of the first appliance, after assuming the emulation address; if failure to receive a proper indication from the first appliance is due to a problem relating to the first appliance, determine that the problem has been resolved; and transmit to the first appliance via the second communications link information directing the first appliance to resume receiving requests and commands directed to the first appliance address, when the second appliance determines that the problem has been resolved; and the first appliance is further configured to resume receiving requests and commands directed to the first appliance address, in response to the information.
28. The system of claim 27, wherein the indication comprises a message.
29. The system of claim 27, wherein the indication comprises failure to receive the message.
30. The system of claim 27, wherein the status relates to whether the first appliance is operational.
31. The system of claim 27, wherein: the first appliance and the second appliance communicate via a link.
32. The system of claim 31, wherein: the second appliance is configured to send a heartbeat to the first appliance, via the link; and the first appliance is configured to send the indication in response to the heartbeat, via the link.
33. The system of claim 27, wherein the second appliance is further configured to cause the first appliance to disable itself, based at least in part, on the indication.
34. The system of claim 33, wherein: the second appliance is configured to cause the first appliance to disable itself, by writing a message to one of the at least one storage devices.
35. The system of claim 34, wherein: the second appliance is configured to write the message to the storage device if a communications link between the second appliance and the first appliance fails.
36. The system of claim 33, wherein: the second appliance is configured to cause the first appliance to disable itself by informing the first appliance over the link.
37. The system of claim 33, wherein: the first appliance is configured to continue to provide an indication to the second appliance of the status of the first appliance after being disabled; and the second appliance is further configured to: instruct the first appliance to begin a start-up procedure, based, at least in part, on the indication, after disabling of the first appliance.
38. The system of claim 27, wherein: the first and second appliances are coupled to a network.
39. The system of claim 27, wherein the second appliance stores information relating to the first address, before the second appliance determines that the first appliance is not operational.
40. The system of claim 27, wherein the first and second appliance addresses comprise, at least in part, a worldwide port name.
41. The system of claim 27, wherein the emulation address and the first appliance address are the same.
42. The system of claim 27, wherein: the first appliance comprises a first fibrechannel adapter having associated therewith the first appliance address; and the second appliance comprises a second fibrechannel adapter having associated therewith the second appliance address.
43. The communications system of claim 27, wherein the first appliance is further configured to: continue to provide indications to the second appliance of the status of the first appliance; and the second appliance is configured to determine that the problem has been resolved based, at least in part, on the indications.
44. A communications system, comprising: a network; at least one storage device; a first appliance coupled to the network via a first communications link, to receive requests or commands for communicating with one or more of the at least one storage device, the first appliance having a first appliance address; and a second appliance coupled to the network; wherein the first appliance is configured to: communicate with one or more of the at least one storage devices, based, at least in part, on the requests or commands; and provide an indication to the second appliance indicating a status of the first appliance; and the second appliance is configured to: determine a status of the first appliance, based, at least in part, on the indication; assume an emulation address comprising the first appliance address to receive the requests or commands directed to the first appliance, based at least in part, on the indication; process the requests or commands addressed to the first appliance after assuming the emulation address; cause the first appliance to disconnect itself from the network based at least in part, on the second status; determine a second status of the first appliance after the first appliance is disconnected from the network; and instruct the first appliance via a second communications link different from the first communications link, to connect itself to the network based, at least in part, on the second status.
45. The system of claim 44, wherein: the first appliance is configured to continue to provide an indication to the second appliance of the second status of the first appliance; and the second appliance is further configured to: instruct the first appliance to begin a start-up procedure to resume reception and processing of requests or commands, based, at least in part, on the indication.
46. The system of claim 44, further comprising: a communications link between the first appliance and the second appliance.
47. The system of claim 46, wherein: the second appliance is configured to send a heartbeat to the first appliance, via the link; and the first appliance is configured to send the indication in response to the heartbeat, via the link.
48. The system of claim 46, wherein the second appliance is configured to write a message to the storage device to cause the first appliance to disable itself, if the link is broken.
49. The communications system of claim 44, wherein: the first appliance is configured to provide the indication to the second appliance via the second communications link.
50. The communications system of claim 49, wherein: the second appliance causes the first appliance to disconnect itself from the network by instructing the first appliance via the second communications link.
51. The communications system of claim 44, wherein: the second appliance causes the first appliance to disconnect itself from the network by instructing the first appliance via the second communications link.
52. A system comprising a first device configured to process requests or commands received from a network, via a first communications link, the first device having a first address; and a second device configured to: determine a status of the first device; assume an emulation address including, at least in part, the first address, based, at least in part, on the determination; cause the first device to disconnect itself from the network based, at least in part, on the determination; determine a second status of the first device after the first device disconnects from the network; and instruct the first device via a second communications link different from the first communications link, to connect itself to the network based, at least in part, on the second status.
53. The system of claim 52, wherein the second device is further configured to: process requests or commands addressed to the first device, after assuming the emulation address.
54. A method of operating a communications system comprising a first appliance to process requests or commands received from a network via a first communications link and a second appliance, the method comprising: determining by a second appliance a status of a first appliance; assuming by the second appliance an address associated with the first appliance, based, at least in part, on the status; processing requests or commands addressed to the first appliance, by the second appliance, after assuming the address; causing the first appliance to disconnect itself from the network based, at least in part, on the determination, by the second appliance; determining by the second appliance a second status of the appliance after the first appliance is disconnected from the network; and instructing the first appliance via a second communications link different from the first communications link, to begin a start-up procedure to resume reception and processing of requests or commands based, at least in part, on the second status, by the second appliance.
55. The method of claim 54, comprising: assuming by the second appliance a same address as the first appliance.
56. The method of claim 54, comprising: determining the status of the first appliance based, at least in part, on an indication from the first appliance.
57. The method of claim 54, wherein the indication comprises a message.
58. The method of claim 54, wherein the indication comprises failure to receive a message.
59. The method of claim 54, wherein: the first appliance and the second appliance communicate via a link.
60. The method of claim 59, further comprising: sending a heartbeat between the first appliance and the second appliance, via the link; sending an acknowledgement of the heartbeat between the first appliance and the second appliance; and disabling the first appliance if either or both of the heartbeat or the acknowledgement are not received by the second appliance.
61. The method of claim 59, further comprising: detecting a break in the link; and writing a message to a storage device to disable the first appliance, if a break in the link is detected.
62. The method of claim 54, further comprising: receiving by the second appliance a request or command addressed to the first appliance after the second appliance assumes the address; and processing, by the second appliance, the request or command.
63. The method of claim 54, further comprising: disabling the first appliance, based, at least in part on the indication.
64. The method of claim 63, further comprising: continuing to receive an indication of the status of the first appliance by the second appliance, after causing the first appliance to disconnect itself from the network.
65. A communications system, comprising: at least one storage device; a first appliance to receive requests or commands for communicating with one or more of the at least one storage devices, the first appliance having a first appliance address; and a second appliance; wherein the first appliance is configured to: communicate with one or more of the at least one storage devices in response to a received request or command; and provide an indication to the second appliance of a status of the first appliance; and the second appliance is configured to: determine a status of the first appliance based, at least in part, on the indication; assume an emulation address comprising the first appliance address in order to receive the requests or commands addressed to the first appliance, based, at least in part, on the indication; process the requests or commands addressed to the first appliance, after assuming the emulation address; and write a message to one of the at least one storage devices to cause the first appliance to disable itself, based at least in part, on the indication.
66. The system of claim 65, wherein: the second appliance is configured to write the message to the storage device if a communications link between the second appliance and the first appliance fails.
67. The communications system of claim 65, wherein: the network comprises a fibrechannel network.
68. The communications system of claim 67, wherein: the first appliance includes a first fibrechannel adaptor having associated therewith the first appliance address; and the second appliance includes a second fibrechannel adaptor having associated therewith the emulation address.
69. The communications system of claim 68, wherein: the first appliance address comprises a first world wide port name (“WWPN”).
70. A communications system, comprising: a network; at least one storage device; a first appliance coupled to the network, to receive requests or commands for communicating with one or more of the at least one storage devices, the first appliance having a first appliance address; a second appliance coupled to the network; and a communications link between the first appliance and the second appliance wherein the first appliance is configured to: communicate with one or more of the at least one storage devices, based, at least in part, on the requests or commands; and provide an indication to the second appliance indicating a status of the first appliance; and the second appliance is configured to: determine a status of the first appliance, based, at least in part, on the indication; assume an emulation address comprising the first appliance address to receive the requests or commands directed to the first appliance, based at least in part, on the indication; process the requests or commands addressed to the first appliance after assuming the emulation address; and writing a message to the storage device to cause the first appliance to disable itself from processing requests or commands, if the link is broken.
71. The communications system of claim 70, wherein: the network comprises a fibrechannel network.
72. The communications system of claim 71, wherein: the first appliance includes a first fibrechannel adaptor having associated therewith the first appliance address; and the second appliance includes a second fibrechannel adaptor having associated therewith the emulation address.
73. The communications system of claim 72, wherein: the first appliance address comprises a first world wide port name (“WWPN”).
74. A communications system, comprising: at least one storage device; a first appliance having a first appliance address, the first appliance being configured to: receive requests or commands for communicating with one or more of the at least one storage devices via a first communications link; and a second appliance configured to: transmit, at selected times, messages to the first appliance via a second communications link different from the first communications link; and wherein: the first appliance is further configured to: communicate with one or more of the at least one storage devices in response to a received request or command; in response to each message received from the second appliance, provide an indication to the second appliance of a status of the first appliance via the second communications link; and inform the second appliance, via the second communications link, of a problem relating to an operation of the first appliance, if the first appliance detects a problem relating to the operation of the first appliance; and the second appliance is further configured to: assume an emulation address comprising the first appliance address in order to receive the requests or commands addressed to the first appliance, if informed of a problem relating to the operation of the first appliance; process the requests or commands addressed to the first appliance, after assuming the emulation address; and instruct the first appliance via the second communications link to begin a start-up procedure, if informed that the problem has been repaired.
75. The communications system of claim 74, wherein: the first appliance is further configured to inform the second appliance that the problem has been repaired.
76. The communications system of claim 75, wherein: the first appliance is further configured to inform the second appliance that the problem has been repaired, via the second communications link.
77. The communications system of claim 74, wherein the second appliance is further configured to de-activate itself from receiving requests or commands addressed to the first appliance, after instructing the first appliance to begin the start-up procedure.